NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

{StruQ}: Defending against prompt Injection with structured queries

Chen, Sizhe; Piet, Julien; Sitawarin, Chawin; Wagner, David (August 2025, 34th USENIX Security Symposium)

Recent advances in Large Language Models (LLMs) enable exciting LLM-integrated applications, which perform text-based tasks by utilizing their advanced language understanding capabilities. However, as LLMs have improved, so have the attacks against them. Prompt injection attacks are an important threat: they trick the model into deviating from the original application's instructions and instead follow user directives. These attacks rely on the LLM's ability to follow instructions and inability to separate prompts and user data.
more » « less
Full Text Available
Stronger universal and transferable attacks by suppressing refusals

Huang, David; Shah, Avidan; Araujo, Alexandre; Wagner, David; Sitawarin, Chawin (April 2025, Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers))

Making large language models (LLMs) safe for mass deployment is a complex and ongoing challenge. Efforts have focused on aligning models to human preferences (RLHF), essentially embedding a “safety feature” into the model’s parameters. The Greedy Coordinate Gradient (GCG) algorithm (Zou et al., 2023b) emerges as one of the most popular automated jailbreaks, an attack that circumvents this safety training. So far, it is believed that such optimization-based attacks (unlike hand-crafted ones) are sample-specific. To make them universal and transferable, one has to incorporate multiple samples and models into the objective function. Contrary to this belief, we find that the adversarial prompts discovered by such optimizers are inherently prompt-universal and transferable, even when optimized on a single model and a single harmful request. To further exploit this phenomenon, we introduce IRIS, a new objective to these optimizers to explicitly deactivate the safety feature to create an even stronger universal and transferable attack. Without requiring a large number of queries or accessing output token probabilities, our universal and transferable attack achieves a 25% success rate against the state-of-the-art Circuit Breaker defense (Zou et al., 2024), compared to 2.5% by white-box GCG. Crucially, IRIS also attains state-of-the-art transfer rates on frontier models: GPT-3.5-Turbo (90%), GPT-4o-mini (86%), GPT-4o (76%), o1-mini (54%), o1-preview (48%), o3-mini (66%), and deepseek-reasoner (90%).
more » « less
Full Text Available
ML-Based Behavioral Malware Detection Is Far From a Solved Problem

https://doi.org/10.1109/SaTML64287.2025.00056

Kaya, Yigitcan; Chen, Yizheng; Botacin, Marcus; Saha, Shoumik; Pierazzi, Fabio; Cavallaro, Lorenzo; Wagner, David; Dumitraş, Tudor (April 2025, IEEE)

Full Text Available
Vulnerability Detection with Code Language Models: How Far Are We?

Ding, Yangruibo; Fu, Yanjun; Ibrahim, Omniyyah; Sitawarin, Chawin; Chen, Xinyun; Alomair, Basel; Wagner, David; Ray, Baishakhi; Chen, Yizheng (April 2025, 47th International Conference on Software Engineering)

In the context of the rising interest in code language models (code LMs) and vulnerability detection, we study the effectiveness of code LMs for detecting vulnerabilities. Our analysis reveals significant shortcomings in existing vulnerability datasets, including poor data quality, low label accuracy, and high duplication rates, leading to unreliable model performance in realistic vulnerability detection scenarios. Additionally, the evaluation methods used with these datasets are not representative of real-world vulnerability detection. To address these challenges, we introduce PRIMEVUL, a new dataset for training and evaluating code LMs for vulnerability detection. PRIMEVUL incorporates a novel set of data labeling techniques that achieve comparable label accuracy to humanverified benchmarks while significantly expanding the dataset. It also implements a rigorous data de-duplication and chronological data splitting strategy to mitigate data leakage issues, alongside introducing more realistic evaluation metrics and settings. This comprehensive approach aims to provide a more accurate assessment of code LMs’ performance in real-world conditions. Evaluating code LMs on PRIMEVUL reveals that existing benchmarks significantly overestimate the performance of these models. For instance, a state-of-the-art 7B model scored 68.26% F1 on BigVul but only 3.09% F1 on PRIMEVUL. Attempts to improve performance through advanced training techniques and larger models like GPT-3.5 and GPT-4 were unsuccessful, with results akin to random guessing in the most stringent settings. These findings underscore the considerable gap between current capabilities and the practical requirements for deploying code LMs in security roles, highlighting the need for more innovative research in this domain.
more » « less
Full Text Available
Meta-synthesis reveals interconnections among apparent drivers of insect biodiversity loss

https://doi.org/10.1093/biosci/biaf034

Halsch, Christopher A; Elphick, Chris S; Bahlai, Christie A; Forister, Matthew L; Wagner, David L; Ware, Jessica L; Grames, Eliza M (April 2025, BioScience)

Abstract Scientific and public interest in the global status of insects has surged recently; however, understanding the relative importance of different stressors and their interconnections remains a crucial problem. We use a meta-synthetic approach to integrate recent hypotheses about insect stressors and responses into a network containing 3385 edges and 108 nodes. The network is highly interconnected, with agricultural intensification most often identified as a root cause. Habitat-related variables are highly connected and appear to be underdiscussed relative to other stressors. We also identify biases and gaps in the recent literature, especially those generated from a focus on economically important and other popular insects, especially pollinators, at the expense of non-pollinating and less charismatic insects. In addition to serving as a case study for how meta-synthesis can map a conceptual landscape, our results identify many important gaps where future meta-analyses will offer critical insights into understanding and mitigating insect biodiversity loss.
more » « less
Full Text Available
Toxicity detection for free

Hu, Zhanhao; Piet, Julien; Zhao, Geng; Jiao, Jiantao; Wagner, David (December 2024, 38th Annual Conference on Neural Information Processing Systems (NeurIPS 2024))

Current LLMs are generally aligned to follow safety requirements and tend to refuse toxic prompts. However, LLMs can fail to refuse toxic prompts or be overcautious and refuse benign examples. In addition, state-of-the-art toxicity detectors have low TPRs at low FPR, incurring high costs in real-world applications where toxic examples are rare. In this paper, we introduce Moderation Using LLM Introspection (MULI), which detects toxic prompts using the information extracted directly from LLMs themselves. We found we can distinguish between benign and toxic prompts from the distribution of the first response token’s logits. Using this idea, we build a robust detector of toxic prompts using a sparse logistic regression model on the first response token logits. Our scheme outperforms SOTA detectors under multiple metrics.
more » « less
Full Text Available
Stronger universal and transfer attacks by suppressing refusals

Huang, David; Shah, Avidan; Araujo, Alexandre; Wagner, David; Sitawarin, Chawin (December 2024, Neurips Safe Generative AI Workshop 2024)

Making large language models (LLMs) safe for mass deployment is a complex and ongoing challenge. Efforts have focused on aligning models to human prefer- ences (RLHF) in order to prevent malicious uses, essentially embedding a “safety feature” into the model’s parameters. The Greedy Coordinate Gradient (GCG) algorithm (Zou et al., 2023b) emerges as one of the most popular automated jail- breaks, an attack that circumvents this safety training. So far, it is believed that these optimization-based attacks (unlike hand-crafted ones) are sample-specific. To make the automated jailbreak universal and transferable, they require incorporating multiple samples and models into the objective function. Contrary to this belief, we find that the adversarial prompts discovered by such optimizers are inherently prompt-universal and transferable, even when optimized on a single model and a single harmful request. To further amplify this phenomenon, we introduce IRIS, a new objective to these optimizers to explicitly deactivate the safety feature to create an even stronger universal and transferable attack. Without requiring a large number of queries or accessing output token probabilities, our transfer attack, optimized on Llama-3, achieves a 25% success rate against the state-of-the-art Circuit Breaker defense (Zou et al., 2024), compared to 2.5% by white-box GCG. Crucially, our universal attack method also attains state-of-the-art test-set transfer rates on frontier models: GPT-3.5-Turbo (90%), GPT-4o-mini (86%), GPT-4o (76%), o1-mini (54%), and o1-preview (48%).
more » « less
Full Text Available
Toxicity Detection for Free

Hu, Zhanhao; Piet, Julien; Zhao, Geng; Jiao, Jiantao; Wagner, David (December 2024, Advances in Neural Information Processing Systems)

Full Text Available
MARKMyWORDS: Analyzing and Evaluating Language Model Watermarks

https://doi.org/10.1109/SaTML64287.2025.00012

Piet, Julien; Sitawarin, Chawin; Fang, Vivian; Mu, Norman; Wagner, David (April 2025, IEEE)

The capabilities of large language models have grown significantly in recent years and so too have concerns about their misuse. It is important to be able to distinguish machine-generated text from human-authored content. Prior works have proposed numerous schemes to watermark text, which would benefit from a systematic evaluation framework. This work focuses on LLM output watermarking techniques—as opposed to image or model watermarks—and proposes MARKMYWORDS, a comprehensive benchmark for them under different natural language tasks. We focus on three main metrics: quality, size (i.e., the number of tokens needed to detect a watermark), and tamper resistance (i.e., the ability to detect a watermark after perturbing marked text). Current watermarking techniques are nearly practical enough for real-world use: Kirchenbauer et al.'s scheme can watermark models like Llama 2 7B-chat or Mistral-7B-Instruct with no perceivable loss in quality on natural language tasks, the watermark can be detected with fewer than 100 tokens, and their scheme offers good tamper resistance to simple perturbations. However, they struggle to efficiently watermark code generations. We publicly release our benchmark (https://github.com/wagner-group/MarkMyWords).
more » « less
Full Text Available
Quantitative Analysis of Drugs in a Mimetic Tissue Model Using Nano-DESI on a Triple Quadrupole Mass Spectrometer

https://doi.org/10.1021/jasms.4c00345

Moore, Alyssa M; Bowman, Andrew; Wali, Syeda Nazifa; Weigand, Miranda R; Wagner, David; Yang, Junhai; Laskin, Julia (December 2024, Journal of the American Society for Mass Spectrometry)

Mass spectrometry is a powerful analytical technique used at every stage of the pharmaceutical research process. A specialized subset of this technique, mass spectrometry imaging (MSI) has emerged as an important technique for determining the spatial distribution of drugs in biological samples. Despite the importance of MSI, its quantitative capabilities are still limited due to the complexity of biological samples and the lack of separation prior to analysis. This makes the simultaneous quantification and visualization of analytes challenging. Several techniques have been developed to address this challenge and enable quantitative MSI. One of these techniques is the mimetic tissue model, which involves the incorporation of an analyte of interest into tissue homogenates at several concentrations. A calibration curve that accounts for signal suppression by the complex biological matrix is then created by measuring the signal of the analyte in the series of tissue homogenates. Herein, we use the mimetic tissue model on a triple quadrupole mass spectrometer (QqQ) in multiple reaction monitoring (MRM) mode to demonstrate the quantitative abilities of nanospray desorption electrospray ionization (nano-DESI) and compare these results with those obtained using atmospheric pressure matrix-assisted laser desorption/ionization (AP-MALDI). For the tested compounds, our findings indicate that nano-DESI achieves lower standard deviations than AP-MALDI which contributes to nano-DESI also achieving lower limits of detection (LOD) for the analytes studied. Additionally, we discuss the limitations of the mimetic tissue model in the quantification of certain analytes and the challenges involved with the implementation of the model.
more » « less
Full Text Available

« Prev Next »

Search for: All records